Image Processing

This ipython (sorry, Jupyter) notebook contains the examples that I'll be covering in the 1-dimensional part of my image processing session. There are no external data files to worry about.

I should warn you that this is tested on python 2; I think it'll run on python 3 but I haven't actually checked.

Task 0

Import numpy and matplotlib, and make a simple plot of a sine wave

N.b. I usually start with a piece of boilerplate roughly like:



In [ ]:

    
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# A nice alternative to inline (which allows interaction with the plots, but can be confusing) is
# %matplotlib notebook

Answer 0



In [ ]:

    
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y, '-o')
plt.ylim(-1.1, 1.1)
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('My brilliant plot')
plt.show()

A Toy 1-D Model

We'll make a simple one dimensional model of a star field. Images of real stars are complicated, but we'll assume that the profile is a Gaussian. I write a Gaussian with mean $\mu$ and variance $\sigma^2$ as $N(\mu, \sigma^2)$.

Almost all stars are essentially point sources as viewed from the Earth, so stars look like the Point Spread Function (PSF) produced by a combination of the atmosphere, the telescope, and the detector. There's no standard notation for the PSF, but I always call it $\phi$. The Full Width Half Maximum (FWHM) is a measure of the PSF's width.

In addition to flux from the stars that we care about there's a smooth background of extra light that comes from a number of sources (atmospheric emission, scattering of star and moonlight, dark current in the detector...); for now we'll just treat this as an annoying constant that I'll call `The Sky'.

In reality CCD data is only measured where we have pixels, but under certain conditions (band-limited; satisfying the Nyquist condition) it turns out that the pixellation is unimportant, so we'll ignore it for this session.

Task 1

Write a simulator that simulates noise-free 1-D data. Provide a function with signature

def phi(x, xc=0.0, fwhm=2):
    """Return a numpy array with a star centred at the point xc and FWHM evaluated at the points x"""

Use a Gaussian PSF, and set the sky level to S; for a Gaussian $N(0, \beta^2)$, the FWHM is $2\sqrt{2\ln(2)}\,\beta \sim 2.3548\beta$

Plot a few of your beautiful simulations. Once you've got it working, wrap it up in a function

simulate(x, S=100)

Answer 1



In [ ]:

    
def phi(x, xc=0.0, FWHM=2):
    beta = FWHM/(2*np.sqrt(2*np.log(2)))
    I = np.exp(-0.5*((x - xc)/beta)**2)
    I /= I.sum()
    
    return I
    
x = np.linspace(0, 20, 41, dtype=float)

S = 100

I = S + np.zeros_like(x)
for F, xc in [(500, 7), (2000, 15)]:
    I += F*phi(x, xc)

plt.plot(x, I,'-o')
plt.show()



In [ ]:

    
def simulate(x, S=100):
    I = S + np.zeros_like(x)
    for F, xc in [(500, 7), (2000, 15)]:
        I += F*phi(x, xc)

    return I

x = np.linspace(0, 20, 41, dtype=float)
I = simulate(x)

plt.plot(x, I,'-o')
plt.show()

Task 2

There are lots of sources of noise in astronomical data, but for now let's assume that the only thing that matters is the finite number of photons that we detect. Detecting $n$ photons in a pixel results in a Poisson distribution; if $n$'s mean value is $\mu$, then its variance is also $\mu$. If $\mu \gg 1$, $P(\mu) \sim N(\mu, \mu)$. You can do better than this if you need to; if $\mu > 4$ (!), Anscombe showed that $x \equiv 2\sqrt{x + 3/8}$ is approximately Gaussian, $N(2\sqrt{\mu + 3/8} - 1/(4\sqrt\mu), 1)$ -- but you probably only care if you're looking at the tails of the distribution.

Add noise to your simulation. You can get Poisson variates from numpy by saying e.g.:

mu = 100
print(np.random.poisson(lam=mu))
print(np.random.normal(loc=mu, scale=np.sqrt(mu)))    # Here's how to get a Gaussian approximation

If you want to set the random number seed so that the noise added is always the same, say something like

np.random.seed(666)

If you can't see your stars you might want to make them brighter --- remember that we added a background noise of 10 when we set the sky level to 100



In [ ]:

    
mu = 100
print(np.random.poisson(lam=mu))
print(np.random.normal(loc=mu, scale=np.sqrt(mu)))    # Here's how to get a Gaussian approximation

Answer 2



In [ ]:

    
x = np.linspace(0, 20, 41, dtype=float)
np.random.seed(1000)
sim = simulate(x)
I = np.random.poisson(sim)

plt.plot(x, sim)
plt.errorbar(x, I, I - sim)
plt.show()

Task 3

Let's investigate measuring the flux in a single isolated star. Start my modifying your previous answer to simulate a noisy single star with $\beta = 2$ (a FWHM of c. 5 pixels), centred at x=0, with total flux F0. In reality estimating the sky background is not trivial, but for now let's simply subtract its known value. Plot one of your simulations, with F0=1000 and S=100.

The simplest way to measure the flux in a star is to sum the counts within an aperture. Modify your code to estimate the flux within 5 pixels of the centre, then run a Monte-Carlo simulation to estimate the mean and variance of your estimator.

Your mean should be close to F0 --- if it isn't, take another look at your code. Your value won't be exactly F0; is this a statistical anomaly, or is it something more interesting?

Answer 3



In [ ]:

    
def simulate(x, F0 = 1000, S=100, beta=2.5):
    sim = S + np.zeros_like(x) + F0*phi(x, FWHM=2*np.sqrt(2*np.log(2))*beta)

    if True:    # I'm going to use a Gaussian approximation to the noise
        sim += np.random.normal(0, np.sqrt(S), size=len(sim))
    else:
        sim = np.random.poisson(sim + S)
        
    return sim - S

#np.random.seed(1000)    # uncomment to make your noise realisation repeatable

nSample = 4000
flux = np.empty(nSample)
x = np.linspace(-20, 20, 81, dtype=float)

R = 5
beta, F0 = 2.5, 1000
for i in range(nSample):
    sim = simulate(x, F0, beta=beta)
    flux[i] = np.sum(sim[np.abs(x < R)])

plt.plot(x, sim)
for r in (-R, R):
    plt.axvline(r, ls=':', color='black')
plt.show()

plt.hist(flux, 20)
plt.axvline(F0, ls=':', color='black')
plt.xlabel(r"$F_0$")
plt.ylabel("N")
mean, std = np.mean(flux), np.std(flux)
plt.title(r"R/$\beta$ = %g   %.2f $\pm$ %.2f (spread %.2f)" % 
         (R/beta, mean, std/np.sqrt(len(flux)), std))
plt.show()

Task 4

Package your estimator into a function and estimate the mean and variance as a function of $R$; plot your results. What value of $R$ gives the smallest variance in $F_0$?

For small $R$ are we measuring $F_0$ correctly? If not, make appropriate corrections and remake your plots. Does your conclusion change?



In [ ]:

    
#np.random.seed(1000)

nSample = 4000

def estimateApertureStats(x, R):
    flux = np.empty(nSample)

    for i in range(nSample):
        sim = simulate(x, F0=F0, beta=beta)
        flux[i] = np.sum(sim[np.abs(x < R)])
        
    return np.mean(flux), np.std(flux)

RR = np.arange(1, 16, dtype=float)
mean = np.empty_like(RR)
std = np.empty_like(RR)

x = np.linspace(-20, 20, 81)
for i, R in enumerate(RR):
    mean[i], std[i] = estimateApertureStats(x, R)



In [ ]:

    
plt.errorbar(RR/beta, mean, std)
plt.axhline(F0, ls=':', color='black')
plt.xlabel(r"$R/\beta$")
plt.ylabel(r"$F_0$")
plt.show()

plt.plot(RR/beta, std)
#plt.axhline(F0, ls=':', color='black')
plt.xlabel(r"R/\beta")
plt.ylabel(r"d$F_0$")
plt.show()

We can calculate the `curve of growth' (the variation in the mean with radius) analytically for our PSF.

It's also possible to do this by passing the function phi to a different implementation of curveOfGrowth, but that requires some hacky and/or more advanced python (currying, or more accurately partial function evaluation); so I won't do that.



In [ ]:

    
import scipy.special

def curveOfGrowth(R, beta):
    """Return the curve of growth, evaluated at the points R, for a Gaussian N(0, beta^2) PSF"""
    return 0.5*(1 + scipy.special.erf(R/beta/np.sqrt(2)))

The 0.5*(x[1] - x[0]) is to correct for the location of the sample points.



In [ ]:

    
cog = curveOfGrowth(RR - 0.5*(x[1] - x[0]), beta)

plt.errorbar(RR/beta, mean/cog, std/np.sqrt(nSample))
plt.axhline(F0, ls=':', color='black')
plt.ylim(F0 + 5*np.array([-1, 1]))
plt.xlabel(r"$R/\beta$")
plt.ylabel(r"$F_0$")
plt.show()

plt.plot(RR/beta, std/cog, label='Theory')
plt.plot(RR/beta, std/(mean/F0), label='Empirical')
plt.xlabel(r"R/\beta")
plt.ylabel(r"d$F_0$")
plt.legend(loc='best')
plt.show()

Are those mean values significantly different from F0? We can calculate the reduced $\chi^2$; note that we're not estimating the mean (F0) from the data, so we divide by $N$ not $N - 1$



In [ ]:

    
print("chi^2/nu = %.3f" % (sum(((mean/cog - F0)/(std/cog/np.sqrt(nSample)))**2)/len(RR)))

Task 5

The lecture notes imply that we'd do better to use the PSF to measure the flux. Write a Monte-Carlo simulation, and compare your results with the previous task

Answer 5



In [ ]:

    
nSample = 4000
flux = np.empty(nSample)
filt = phi(x, FWHM=2*np.sqrt(2*np.log(2))*beta)
filt /= np.sum(filt**2)

R = 5
beta, F0 = 2.5, 1000
for i in range(nSample):
    sim = simulate(x, F0, beta=beta)
    flux[i] = np.sum(filt*sim)

plt.plot(x, sim)
plt.plot(x, filt/np.max(filt)*np.max(sim), ls=':', color='black')
plt.show()

plt.hist(flux, 20)
plt.axvline(F0, ls=':', color='black')
plt.xlabel(r"$F_0$")
plt.ylabel("N")
plt.title(r"%.2f $\pm$ %.2f  (%.2f)" % (np.mean(flux), np.std(flux), np.sqrt(S*np.sqrt(4*pi)*beta/(x[1] - x[0]))))
plt.show()